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Abstract: We consider a multivariate density model where we estimate 
the excess mass of the unknown probability density / at a given level u > 
from n i.i.d. observed random variables. This problem has several appli- 
cations such as multimodality testing, density contour clustering, anomaly 
detection, classification and so on. For the first time in the literature we 
estimate the excess mass as an integrated functional of the unknown den- 
sity /. We suggest an estimator and evaluate its rate of convergence, when 
/ belongs to general Besov smoothness classes, for several risk measures. 
A particular care is devoted to implementation and numerical study of 
the studied procedure. It appears that our procedure improves the plug-in 
estimator of the excess mass. 
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1. Introduction 



Let Xi, . . . , Xn be n i.i.d. observations in M , d > 1 having unknown underlying 
distribution function F with probability density /. We want to estimate the 
excess mass of this distribution, at level v > which was defined by (fiol ) as 

where | • | denotes the Lebesgue measure of a set and C{v) = {a; e M'' : f{x) > v} 
is the density level set (at level v) or density contour cluster (see Figured]). 
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Fig 1. Excess mass for a bivariate probability density with two local modes 



Estimating the excess mass has muhiple practical apphcations that we men- 
tion without actually dealing with them. Most applications use differences of 
excess-masses at different levels v in order to test the multimodality of a proba- 
bility distribution. Hartigan and Hartigan ijlll ) introduced the dip-excess mass 
and defined an estimator which allowed to test multimodality. This estimator 
was extensively used in the literature since, see e.g. (fiol). ([1), They insist 
on the fact that such a procedure separates mode estimation from its location. 

Another important application of the excess mass functional is to the esti- 
mation density level sets (or density contour clustering), that is the support 
(the set of points) C(i/) on which the excess mass at level v is calculated. This 
requires a good estimator of the excess mass as well as an optimization pro- 
cedure. Polonik ([21]) proved consistency of such estimators of the density level 
set and found some rates of convergence. Tsybakov (26) gave minimax rates for 
estimating smooth star-shaped level sets of a density. These methods are either 
very difficult to implement or use assumptions which are difficult to check. They 
use a margin assumption quantifying the smoothness of the density / around 
the level as introduced by (17). Later, ([!) used a Bayesian approach and 
((23h revisited the plug-in estimator for this problem. They may claim for com- 
putational feasibility as well as for strong theoretical properties. On the other 
hand, studied and implemented an estimator of the support of a density 
via complexity penalized excess-mass criterion. 

Other applications of excess mass estimation include estimation of regression 
contour clusters (H^), discrimination of locally stationary time series (4) and. 
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via level set estimation, anomaly detection and classification as described by 

(Hi). 

These methods generally avoid using a nonparametric estimator of /. In- 
deed, such an estimator may not be very attractive in higher dimensions d. 
We overpass this difficulty by estimating the excess mass £(1^) as an integrated 
functional of / at fixed level ly > 0, that is 

£(y)= j ^,{f{t))dt for = (|x|-j.)l|,|>,. (1.1) 

Indeed, excess mass estimation is a particular case of estimating integrated func- 
tionals of / of general type: = J $(/), where $ is known. The study of such 
functionals with $ 4-times continuously differentiable is now completed. It was 
noticed since dll), ^ and many others that can be estimated at a parametric 



rate as soon as the Holder smoothness of / is larger or equal to 1/4, but at 
a slower nonparametric rate otherwise. The lower bounds for the nonparamet- 
ric rates case were established in These rates were achieved in a minimax 
setup by wavelet estimation procedure in the paper by (fisl ) and in an adaptive 
to the smoothness setup in the paper by dU (with a loss with respect to the 
minimax rate of the usual logarithmic order). Nemirovski (I20I ) gave asymptot- 
ically efficient estimators for 1 and 2-times continuously differentiable function 
4>. In our problem $ is continuous but not differentiable (when periodized). Our 
approach works for any other integrated functionals with continuous but not 
differentiable $. 

In the particular case of a large enough level the excess mass problem 
is reduced to the estimation of the Li norm. Obviously, this problem has no 
interest in the density model. In the regression model (jlSh studied the problem 
of estimating the Li norm and in the gaussian white noise model estimated 
the Lr norm for r > 1. 

The excess mass estimator we construct in this paper generalizes the esti- 
mator of the Li norm in (flil). Their procedure actually uses a Fourier series 
approximation for the function whose coefficients are known and depend on 
the level i/. As $ is applied to /, so are the functions of the Fourier basis. A 
kernel estimator of / is then plugged-into the functions of the Fourier basis and 
they are multiplied by a factor which actually reduces the bias. This multiplica- 
tive factor is depending on the variance of the kernel estimator in the considered 
model. The integral of this expansion gives the final estimator of the functional. 

In this paper, we consider a density (thus heteroscedastic) model. Neverthe- 
less, the excess mass functional can be defined for the regression and gaussian 
white noise models as well. As we consider the density model, the multiplicative 
factor depends upon a variance which is proportional to the unknown density / 
and another estimator is plugged-into this factor. Moreover, we have a multidi- 
mensional setup fitting better to most applications. For the study of rates, we 
replace the preliminary kernel estimator by a wavelet estimator which allows us 
to compute rates over more general Besov smoothness classes for the unknown 
probability density /. The whole procedure is detailed and fully explained in 
Section [2] 
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In dieD, lower bounds for estimating the Li-norm were given, but upper and 
lower bounds were separated by a logarithmic factor. We show that our estimator 
attains the same rate as the estimator of the Li-norm in (flih. In the gaussian 
white noise model, (3,) improved on the lower bounds for the particular problem 
of excess mass estimation. Nevertheless, a gap still remains between the upper 
bounds we present here and their lower bounds. 

The paper is organized as follows. In Section [21 we present the estimation 
procedure. In Section [31 we state an expansion for the upper bound of the 
expected errors (pointwise, L2 and Loo) of our procedure. Next, we determine 
the optimal parameters of the method and give in Theorem 13.11 the rate our 
procedure achieves. Section [4] is devoted to the empirical study. The proofs are 
postponed to Section [3 

2. Excess mass estimation procedure 

Let us describe the densities / considered in the sequel. We suppose that / is 
compactly supported with known support K = [Ai, Bi] x . . . x [Ad, Bd] C M''. We 
denote, for some m* > 0, J^(IK, to*) the class of compactly supported probability 
densities / : K — > M such that 



is satisfied for some p £]0, 1[. 

The estimation procedure consists of four steps. At the first step we approx- 
imate the functional defined in (|l.ip by its truncated Fourier series with 
known coefficients. Then, at the second step, we estimate the unknown function 
/. In particular, we consider here wavelet estimator but it is possible to consider 
kernel estimators. The third step consists in plugging-into the Fourier series the 
wavelet estimator of / in an unbiased way. Finally, we integrate on K to get the 
estimator £{iy) of £(z/). Let us describe in more details this procedure. 

2.1. Approximation of the functional 

We assumed in (j2.ip that the density of interest / is bounded by some p < 1 
uniformly over the class !F(K,m*). It allows us to define as a function on 
[— 1, 1] to [0, 1] and then, its approximation by Fourier series is given by 



inf /(i) > TO* and ||/||oo < P 



(2.1) 




< Cfe(j/) 



(cos(7rfc-),$^) 



2 



(cos(7rfc) — cos(7rfci/)) 



(2.2) 



(sin(7rfc-), = 0. 
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We insist here on the fact that the procedure appHes for any other integrated 
functional J with known continuous $ when periodized. The values of the 
Fourier coefficients Ck will change, but they will still be bounded by a quantity 
of order for large k and all the proofs work out the same way. 

Let us discuss on the class constraints in ()2.1|) . On the one hand, we assume 
densities / to be uniformly bounded by some constant p < 1. More generally, 
we could have considered a class J-{K, R,m*) (for R > fixed) of probability 
density functions / : IK ^ M such that f{t) > m* > for all t G K and 
such that there exists some < p < R and ||/||oo < P < R- Then, the excess 
mass is defined via the functional <I>i, : [—R,R] — > [Q,R] for any < i/ < R. 
For this functional we consider the rescaled Fourier basis on [—R, R] and the 
corresponding coefficients are 

cqW = ^ , Ck{v) = -^^^ (cos(7rfc) - cos(7rfc — )). 

Therefore, without loss of generality we consider R= 1. On the other hand, we 
ask that the underlying density to be bounded from below away from 0. This 
is a classical assumption in the density model. Indeed, the variance of density 
estimators are proportional to / and it cannot be controlled without such an 
assumption. 



2.2. Estimation of the density f 

We need now a nonparametric estimator of the density /. We can use any 
method and tune the smoothing parameter similarly. We chose the wavelet 
estimator of / in order to deal easier with higher dimensions and to general 
functions in Besov classes. For this purpose, let us be given a pair of scaling 
function (f> and associated wavelet function tp. We assume that these functions 
are compactly supported (of support [0, 2M]); they can be of class C with r as 
large as desired, see for example the Daubechies's wavelets, (@). With tensorial 
product , one can construct a multivariate scaling function and 2*^ — 1 associated 



wavelets always denoted by {0, i/''^}cg{i....,2<i-i}i see (|18D. In the sequel, for any 



function g e L^{R'^),l e e N, we use the notation gjj{.) = 2i'^^^g{2^. ~ I). 

For a given j G N, the set , Vj,,;, , J' > j, {hM) & ^2^,6 e {1, -.2'' - 1}} is 
an orthonormal basis of L'^iW^) and one can write, with the usual notations for 
the projections 

yg^L^M."), Vj>0, g^E,g + D,g (2.3) 

where 

l^Z<i j'> j leZ'' l<e<2''-l 

We omit the spaces where the indices are varying: j, jf are always integers 
and I is always a d— dimensional index. Denote [.J the integer value and define 
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jo = Ljq/J and joo = LJoo'J where Jq and j'^ are such that 

/ n V^"" 

2^0 = logn and 2^- = . (2.4) 

\lognJ 

Taking advantage of the decomposition (|2.3|1 . we propose to estimate / by its 
wavelet estimate at the level j Vctrying between Jq and joo 

f^it)''^'Y.^^Mt) (2.5) 

where the empirical coefficients are defined for any integer j varying between jg 
and joo and for any integer / e {2.^.41 - 2M, 2Wi} x . . . x {2.^71^ - 2M, 2^Bd} 
(we denoted K ^ [Ai, Bi] x . . . x [Ad, Bd]), 



^3,1 - 

n 

1=1 



1 " 
Put 

A|(i) = v^(/,(i)) = - E ( / -^.,'1'^..'=/ - / '^.^■.'i/ / hhf) hiAt)hiAt) 

" h.b ^ J J J 

and observe that 

A^2(0 < ((2M)2'^||<^||L||0||^||/||oo) ^. 

Using (|2.ip . we bound the constant in the right term by 7 = (2M)^''||(^||^. 
Moreover, we need to bound from below the variance Aj(i). Therefore, we choose 
a wavelet such that there exists to > satisfying the assumption 

Vj - Jo . . . , Joo, Vi e K, a;, \c^{2H -l)\>m (2.6) 

where jo, joo are defined in 



2.3. Plug-in 



A candidate to estimate AN^^{f{t)) could be co(i^) + J2k=i ^k{i^) cos(7rfc/j(i)). 
Following (fl^ , this estimator has too large a bias and we decrease this bias by 
considering the following modification. We estimate AN^,j{f{t)) by 



N 



ANj{t) = co(i/) +^Cfe(z/)exp(^2;.2^^.(^)2/2)cos(^fc/j(t)). 



fc=i 



Since the variance of the estimate of the density Xj{t) is unknown, we replace 
it with an estimate based on the empirical moments 



m) = 



n 1 1 



n '■ — ' n 



(2.7) 
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Notice that there exists some constant c > such that A|(t) < c2^^'^n ^ which 
could be much larger than A|(t). We decide then to truncate A|(i) at 
(for 7 = (2Af)2'^||(/i||^) which is the upper bound for \\X'j{-)\\oo- 



2.4- Estimator of the excess mass 



Finally, we propose to estimate £{1^) by 



£iiy) = / AN,j{t)dt (2.8) 

JK 

= ^Ckiv) I exp f — mini A|(t),7 \\ cos (nkfj{t)j dt 

fe=0 \ L J / 

for X'j{t) defined in and 

coii^) = {\-vfl2, Cfe(z.) =2(^fc)-2(cos(7rfc)-cos(^H), 7 = (2Af)''^||0||L . 



3. Upper bounds and convergence properties 



In Proposition 13. 1[ we give bounds from above for the expected errors of the 
estimation procedure of the functional £{y)- This bound is depending on the 
parameters of estimation j and iV and on the wavelet approximation error of /. 
Next, we determine the optimal parameters j and ISS to balance the terms ap- 
pearing in the upper bound of Proposition l3.1l under the additional smoothness 
assumption on the unknown function /. In Theorem 13. 11 an upper bound of or- 
der (nlogn)"*/'-^*"'"'^'' for our estimation problem is found. In the gaussian white 
model nearly minimax lower bounds of order (nlogn)^''/(^''+'^'(logn)~^/^^''+^-' 
(d = 1, at a log factor) were established by (Q)- They improved the techniques 
used by (flil ) who found lower bounds of order (nlogn)~*/*^^*+''^(logn)~^ (d = 1) 
for estimating the Li norm in the Gaussian white noise model, but there is still 
a gap between upper and lower bounds. 



3.1. Expansion of the estimation error 

The following bound for the mean absolute error of estimation of the excess 
mass holds. 

Proposition 3.1. Let j be an integer between Jq and and N a positive 
integer. Assume that f G JF(K, m*). Choosing (j) such that l\2.6} holds for some 
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m > 0, we get 



Ef[d{£,£)) < a 



Ci N 





1 




X 3/2 


V " 








M 


n 1 



(logn)i/2 

1/2- 

exp 



C2 



1/2 



\qsN 



n 



L!7^22^ 
2 n 



for d denoting either i) the point wise difference, i.e. d{g, h) ~ \g{i^) — ^{1^)1 for 
a given v > 0, ii) the sup-norm, or Hi) the normalized L2— norm, i.e. d{g,h) = 
\\g — h\\2/\K\ with |K| denoting the Lebesgue measure of the set K and 

Ci-2|K|, C2^{4MY'/^4TT-\2MyU\\^, C3 = Q - 4^-2|K|. 



3.2. Upper bound for the estimation error 

Let us now tune the parameters N and j in an optimal way. We denote, for 
fixed m* > 0, 

T{m*) = |J{J^(K,m*) : |K| < D}, 

K 

for some fixed constant D > 0. We assume a Besov type smoothness condition 
for / related to the wavelet expansion of the density /. More precisely, let 
p,q > 1, s > and L > 0. The Besov bodies are characterized in term of 
wavelet coefficients as follows 

feblVL)^\\ao,.\\p+ |^E[2^"(^+^-f)||/3,.||,]'j <L. (3.1) 

Note that, for a given r— smooth wavelet with r > s, the Besov norm of a 
function / is equivalent to the sequence norm of the wavelet coefficients of the 
function and then the concepts of Besov body and of Besov spaces are equivalent. 
For further details on the Besov spaces and their links with the wavelet analysis, 
see for instance (fiol ). We derive from the smoothness assumption (|3.ip on the 
unknown function / the following bound for the bias term 

We choose the integer j depending on N such that [2^"] = in order to 
balance the bias and the approximation error. We replace j and minimize next 
the variance terms 

3/2 



+C3N-U—-] )exp' ^ 
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WetakeiV= [(Con log r7.)''/(2''+'i) J , with a constant Co > such that Co tt^ 7/2 < 
min{s,c?/2}/(2s + (i). In this way, the exponential term in the variance term be- 
comes a polynomial term smaller than the bias and the approximation term. 
The latter terms are of the same order: (n log n)~^^ i^s+d) ^ j^q^q that the variance 
term does not drive the rate. The following theorem is then proved. 

Theorem 3.1. Let s > 0, I < p < 00, 1 < q < 00, D, m* > and 

< p < 1. Let us suppose that f belongs to J-{m*) f] h^^^{L) and assume there 
exists m positive such that the technical assumption (jjj. 6]) holds. Let £*{•) be 
the estimate of £{■) defined by i2.5\) - l[27^) for the following choice of estimation 
parameters 

3* - [f'\, V" - (nlogn)^ , TV* = L(Conlogn)^/('^+'')j 

where Cq > is a constant smaller than miii{2s, d}-(TT'^{2M)'^''- {{(^W^ {2s + d)) ^ 
Then 

IW sup {n\ogn)^Ef (d{£*,£)) < C, 

"-°°/e^(m-)n&;_,(L) ^ ^ 

for d denoting either i) the point wise difference, i.e. d{g,h) = \g{i^) — 
for a given v > 0, ii) the sup-norm, or Hi) the normalized 1^2 — norm, i.e. 
d{g,h) ~ \\g — ft,||2/|K| with |K| denoting the Lebesgue measure of the set IK 
and the constant C > depends on s, L, D, d and cj). 

As we already mentioned, the theorem is still valid if we estimate any other 
integrated functional of the type / $(/) with $ continuous not differentiable. 

4. Numerical results 

First, we describe the implementation of our estimation procedure £*{i') at 
level > 0. In order to compare, we also implement a plug-in procedure [v) 
defined as follows 

£^\y) - / {U^) - y)dx. 

i/„(x)-i/>0 

where /„ is a density estimator. Let us recall that the best rate achievable by 
this procedure on the Besov balls (for the same loss functions as in Theorem 

13. ip is the usual nonparametric rate n~ 2s+<J obtained for the tuning parameter 

1 

of order n^s+d . 

We compare several error measurements: in the sequel, E2 and (respec- 
tively E2^ and E^) denote the integrated squared error and the sup- norm due 
to our estimator £* (respectively due to the plug-in estimator £^^). Moreover, 
the probability p2 — Pf{E2 < E2') that the error of our procedure £* be 
smaller than the corresponding error of is a good indicator of the perfor- 
mances of our procedure with respect to the plug-in procedure. Similarly, we 
consider p^o = PfiE^ < E^') 
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In the first part, we explain the automatic algorithm: observe that it is slightly 
different than the procedure described in the theoretical section (with respect 
to adaptation for instance). Next, we give a short summary of our simulation 
results: we try a lot of densities and we present here the most representative 
and relevant examples. 



4.1. Algorithm 

The simulations are performed with the free software R V2.4. For the plug-in 
procedure, the estimator /„ is computed with the data driven procedure called 
density provided by R. This kernel procedure of density estimation determines 
automatically the smoothing parameter h** that we use for the plug-in proce- 
dure. Since the theoretical optimal index for the plug- in procedure is n"^/'-''"'"^'*^ 
we deduce s from h**. Then we modify the smoothing index introducing the 
logarithmic term as indicated in Theorem 13.11 The parameters used when our 
procedure is computed are given by 

N = [{Can log n) \, h = (nlogn)"^^, Cq = d. 

We emphasize that the procedure density () is again used for our ouwn proce- 
dure but with the smoothing index modified as prescribed in Theorem 13.11 A 
bootstrap procedure of 100 replications is introduced to estimate the expected 
value Ef{fn{x)) and the variance — V/(/„(x)) of /„(x). As the bootstrap 
procedure gives a very accurate estimator of the variance, the truncation in the 
exponential term is actually useless for practical purposes and stands in formula 
(|2.8p only for technical reasons in the proof. It is then sufficient to describe the 
estimator in (j2.8|) as 

^ Ck{iy) J exp y—^X^{x)j cos(^TTkfn{x)j dx. 

As explained in the introduction, the exponential factor is a correction of the bias 
introduced when / is plugged-into the cosine function. Therefore, we suggest to 
compute the estimator as 

TV „ 
^*{'^) = y2 ^k{v) / cos [TTkEf{fn{x))) dx, 

where Ef(fn{x)) is very well recovered by a bootstrap estimation procedure. 
In practice, both methods give the same results, but the second formula is 
computed significantly faster than the first. 

A sequence = z/i, . . . , viqq = 1 is considered. The empirical errors denoted 
E2, -Bf^ and E^, E^ are computed via K = 2Q Monte Carlo simulations. We 
denote p2 and Pao the frequencies of success of our procedure. 
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Fig 2. Set of studied densities, (a): standard gaussian; (b): mixture of gaussian and uniform; 
(c): mixture of 2 gaussian and laplace; (d):mixture of gaussian with isolated spoke. 



4-2. Univariate densities 

We consider some example of probability densities which are mixtures of gaus- 
sian, uniform and Laplace laws, see Figure [2] The density (a) 

is the gaussian density: this is the most standard example and it is very popular 
in practical studies. The density (b) 

f{x) = 0.8 • fM(-i,o.7){x) + 0.2 • fuiia){x) 

is a mixture of a gaussian density and a uniform density. Remark that the 
uniform density is not continuous and this allows us to study the robustness of 
our procedure. Moreover, the gaussian part is very smooth: the mixture density 
is then difficult to estimate because there is a conflict about a global choice of 
the bandwidth. The density (c) 

f{x) = 0.3 • /A/-(-i,o.5)(a;) + 0.3 • f^f{l.5s){x) + 0.4 • fc{fi){x) 

is a mixture of a gaussian density and Laplace density. Since the Laplace density 
is not differentiable at its mode, we study again the same phenomenon: the 
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Table 1 

Univariate densities. Comparison of €* and E^^ in mean integrated squared error and in 
mean error of the sup-norm, over K = 20 Monte-Carlo simulations, for various sizes of 
samples n = 100, 1000, 10000. 



/ 


n 




El 


1^2 


P2 




Elo 


E^ 1^1^ 


Poo 


a 


100 


0.00504 


0.00542 


0.93 


0.45 


0.04450 


0.04855 


0.92 


0.45 


a 


1000 


0.00079 


0.00066 


1.19 


0.70 


0.01937 


0.01765 


1.10 


0.75 


a 


10000 


0.00008 


0.00006 


1.32 


0.55 


0.00602 


0.00590 


1.02 


0.60 


b 


100 


0.00354 


0.00533 


0.66 


0.20 


0.03742 


0.04989 


0.75 


0.30 


b 


1000 


0.00147 


0.00086 


1.71 


0.90 


0.03217 


0.02133 


1.51 


0.95 


b 


10000 


0.00170 


0.00083 


2.06 


1.00 


0.03645 


0.02445 


1.49 


1.00 


c 


100 


0.00520 


0.01027 


0.51 


0.15 


0.04132 


0.04924 


0.84 


0.30 


c 


1000 


0.00077 


0.00036 


2.17 


0.80 


0.01745 


0.01274 


1.37 


0.80 


c 


10000 


0.00075 


0.00021 


3.64 


1.00 


0.01714 


0.00938 


1.83 


1.00 


d 


100 


0.03271 


0.01857 


1.76 


1.00 


0.11473 


0.08293 


1.38 


1.00 


d 


1000 


0.00975 


0.00346 


2.82 


1.00 


0.05985 


0.03606 


1.66 


1.00 


d 


10000 


0.00248 


0.00063 


3.91 


1.00 


0.02975 


0.01525 


1.95 


1.00 



smoothing indices can not be at the same time globally designed and everywhere 
optimal. The last density (d) 

f{x) = 0.5 • /AA(-i.5,o.4)(a;) + 0.05 • /v'(-o.8,o.i)(a;) + 0.45 • f^(i^a.8){x) 

is a mixture of three gaussian densities with isolated peaks and different vari- 
ances. Density (d) is a case where the smoothing indices of the estimation pro- 
cedures have to be space-dependant in view to capture the small sharp peak. 

One challenge is to check whether our procedure overcomes all the enumer- 
ated difficulties for the estimation of the density /. The results are presented in 
Table [H 

First, we note that our procedure is becoming more accurate when the size 
of the sample increases. It seems that our method is relatively complicated and 
need enough data to be powerful. In the opposite, the naive plug-in method 
is a robust procedure which is not so bad when few data are available: when 
n = 100, the frequencies of success of the plug-in method with respect to our 
procedure is 1 — p = 0.55,0.80,0.85 for the density (a), the density (b) and the 
density (c). But when n is larger, our method is more successful: p = 0.90, 0.80 
for the density (b), the density (c). 

When the densities become more and more complex (by complex, we mean an 
increase of the number of modes or irregularities in the density) , our procedure 
is much more relevant than the plug-in procedure. For large samples, n = 10000, 
we observe a benefit of 106% for a mixture of gaussian and uniform densities, a 
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benefit of 264% for a mixture of gaussian and Laplace and a benefit of 291% for 
a mixture of densities with a small isolated peak. In parallel, we observe that, 
for all the Monte Carlo simulations, our estimator £* is systematically better 
than with a rate p2 = Poo — 1- 

Observe that for the density (d), our method is better than the plug-in esti- 
mator for any sample size. It seems that the change of the smoothing parameter 
adding an extra logarithmic term is crucial to kill a great part of the bias term. 

4-3. Bivariate densities 

In this part, we focus on gaussian and uniform densities. Let us denote 

Af{{EX, EY), i^nX), VW), Pxy)) 

the bivariate gaussian density of (X,Y). The studied densities are plotted in 
Figure m The density (A) is the standard one 

/ = /a/'((0,0), (1,1,0))- 




Fig 3. Set of studied 2D densities. (A): 2D Gaussian. (B): mixture of 2D gaussian and 
uniform. (C): mixture of two 2D gaussian. (D): Mixture of three 2D gaussian. 
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Table 2 

Bivariate densities. Comparison of £* and S^^ in mean integrated squared error and in 
mean error of the sup-norm, over K = 20 Monte-Carlo simulations, for various sizes of 
samples n = 400, 1000, 10000. 



/ 


n 




E* 


^2' 1^2 


P2 






Eoo/E^ 


Poo 


A 


400 


0.01685 


0.00870 


1.94 


0.95 


0.07435 


0.05675 


1.31 


0.95 


A 


1000 


0.00948 


0.00394 


2.41 


1.00 


0.05445 


0.03525 


1.54 


1.00 


A 


10000 


0.00635 


0.00263 


2.41 


1.00 


0.04524 


0.02867 


1.58 


1.00 


B 


400 


0.10397 


0.04628 


2.25 


1.00 


0.17667 


0.12818 


1.38 


1.00 


B 


1000 


0.07460 


0.02943 


2.53 


1.00 


0.15398 


0.10660 


1.44 


1.00 


B 


10000 


0.04747 


0.01985 


2.39 


1.00 


0.12894 


0.08901 


1.45 


1.00 


C 


400 


0.01184 


0.00555 


2.13 


1.00 


0.06442 


0.04609 


1.40 


1.00 


C 


1000 


0.00906 


0.00432 


2.09 


1.00 


0.05462 


0.03717 


1.47 


1.00 


c 


10000 


0.00379 


0.00134 


2.83 


1.00 


0.03566 


0.02081 


1.71 


1.00 


D 


400 


0.26965 


0.26187 


1.03 


0.55 


0.34953 


0.34350 


1.02 


0.55 


D 


1000 


0.25655 


0.24609 


1.04 


0.85 


0.33525 


0.32628 


1.03 


0.85 


D 


10000 


0.23705 


0.22277 


1.06 


1.00 


0.31686 


0.30497 


1.04 


1.00 



The density (B) is a mixture of the Gaussian density and the uniform density 

/ = 0.6 • /V((_l,0), (0.7,0. 7,0)) + 0.4 • /l/([o.5,1.5]x [-0.5,0.5]) 

and Density (C) 

/ = 0.8 • /a/-((-0.5,0.5),(1,1,0)) + 0-2 • /W((0. 4,-0. 4), (1,1,0)) 

is a mixture of two gaussian densities. Last, the density (D) 

/ = 0.45 • /v((o,o), (1.5, 1,0. 95)) + 0-45 • /a/'((o,o), (1.5, 1,-0. 95)) 
+0.10 • /a/-((o, -1.2), (0.2, 0.2,0)) 

is a mixture of three gaussian densities presenting an isolated spot. 

Table [2] presents results for different sample sizes n = 400,1000,10000. As 
in the one dimensional case, we observe that the improvement increases with 
n and with the complexity of the underlying theoretical density increases. We 
observe that our procedure is always better than the plug-in procedure. The 
frequencies of success of our procedure are very high: p2 = poo = 0.55 for the 
densities (A), (B), (C). Even in the case of the density (D) where the results 
are mitigated, the worse result is p2 = poo — 0.55 when n = 400 (which is very 
small for 2— dimensional non parametric estimation problems). 

The improvements are more remarkable in bivariate case (but for the den- 
sity (D)). We may say that the gain of a logarithmic factor in the rate of our 
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estimator gives a significant compensation for the cmse of dimensionality. For 
the standard gaussian density, we observe a high improvement of our estima- 
tion procedure compared to the plug-in estimator as the dimension increases: 
for n = 10000, the empirical mean squared error has improved from 32% in the 
case of the density (a) to 141% in the case of the density (A). When empirical 
sup-norm is considered, the improvements are not so extraordinary but there 
are significant: from 2% to 58%. 

We think that the mediocrity of the results in the case of the density (D) 
could be corrected by a more accurate determination of the smoothing indices. 



5. Proofs 



5. 1 . Proof of Proposition \3.1\ 

In order to study the quadratic error, it is useful to note that the hypothesis 
(j2.ip for some p < 1 implies that is zero if > 1. Let us summarize again 
some notation 

• - -!y)l|t|>„ 

• AN^,y{t) = co(i^) + Ylk=i cos(7rfci), 

+ Efci ck{v) exp(7r2fc2A^.(i)V2) cos(^fc/,(t)), 
. = coH-fEf=iCfcMexp(^max{A2(t),72il|j cos(^fc/^ (i)), 

• the unknown density / writes on the wavelet basis f = Ej f + Djf where 

E,f = E and D,f = E E E f^hi^f,i- 

Then £{v) — J^ANj{t,h')dt. We have the following expansion 

£{i^) - £ijy) = + S2M + B2{v) + A{v) + Bx{v) 

A^j{t,v) - AN,j{t,v)\dt+ / [AN,j{t,v) - E{ANj{t,v))]dt 

[E(AN,,{t, v)) - AN<^,{Ejf{t))] dt 

[AN^,{Ejf{t)) - 'l>,{Ejf{t))]dt 

[^,{E,f{t))-^Mmdt 

where Si^v), S2{v) are stochastic terms, B2{v) is a bias term due to the plug in, 
A{v) is an approximation term and Bi{v) is a bias term due to the estimation 
of the function of interest /. Proposition 13.11 is proved combining (|5.ip . ([57 
(lOl) . (I57l) and dSH). 
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5.1.1. Bias term (due to the estimation). 

The bias term is bounded using the fact that |a(t)+ — < \a{t) — h{t)\ 

iBiMI < / \E,f{t) - .f{t)\ dt < \\D,f\\^. (5.1) 
Note that the same bound holds for ||i?i||oo and for ||Bi||2/|K|. 

5.1.2. Approximation term. 

Using the values of the Fourier coefficients given in (|2.2[) , we have the following 
approximation for any N , 

yu G hi, 1], |$,(u) - AAr$,(u)| <^J2t^^ 

implying that 



k>N 



\A{u)\ < \A^ME,f{t))-'t>^{E,fmdt<^. (5.2) 
Note that the same bound holds for ||A||oo and for ||yl||2/|K|. 

5.1.3. Bias term (due to the plug-in). 

First, we state the following lemma proved in the next section. 

Lemma 5.1. Let N be a positive integer, f belongs to J-(K,m*) and fj be the 
wavelet estimator constructed in 12.5]) . For k — I, . . . , N and j varying between 
jo and joc , we have 



Vi e K, 

where 



E 



^7^^k^\]{t)/2 cos(7rfc/j(t))j -cosinkEjfit)) 



n 



for an universal positive constant A. 
Applying Lemma [531 we get 



\B2{y)\ < / \EAN,j{t,v) ^ AN^u{E,f{t))\dt 



< 

fe=i 



V|cfc(j.)| / e^''='^'(*)/2i?(cos(^fc/,(i)))-cos(7rfci?,/(t)) 
fe=i 

< |K| Vk(z.)|sup( sup e-''='^'(*)/0 . 

k=i \k=l,...,N J 



dt 
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Taking 



23d 4 ('2?dli\ 



1/2 



we obtain 



Note that the same bound holds for ||i?2||oo and for ||S2||2/|IK|. 
5.1. 4-. Stochastic term. 

The wavelet estimator fj{t) at point t = (ti, . . . ,trf) is depending on the ob- 
servations Xi = (Xii, . . . , Xdi) such — ti\ < 2M for any p = 1, . . . ,d. 
Therefore, fj{t) and fj{t') are independent as soon as there exists a direction 
p such that — ipll > 2Af and the same holds for any statistics Z{t) and 
Z(t') based on the observations. As in (flil). adapted for d-dimensional setup: 

E 



[ {z{t) - EZ{t)) dt < ( [ z{t)dt] 

Jk \Jk ) 

< / / Cov{Z{t),Z{t'))dtdt' 



<{ f f [V{Zit))-ViZ{t'))f'l[l{\t,-t'^\<2M2-ndtdt'] 

\JKJK ) 

<\\ \ \ \y{m) + y{m'))\ n ^(i^p - < 2Ai2-^)dtdt' 

< (4M)'*/22-^'^/2 [J ViZ{t))dt^ ' . 



Denoting Zj^kit) = exp(7r /c Aj (t) /2) cos(7r/c/j (t)), it follows 

N 

E{\S2ii^)\) < Y.^u{^)E{\ {Z,,k{t)~E{Z,,u{t)))dt\) 
k=i •' 

N 

< (4M)'^/22-^"^/2|]K| VlcMI sup 1-1/2 (5 4) 

fe=l *eK 
N 

E{\\S2\\oo) < {4Mf^'2-'''/^\K\y2sup\ck{i^)\supV'^'{Z,.kit)) (5.5) 

K — \ 

N 

E{\\S2h) < (4M)'^/'2-^''/'|K| ^||cfe(.)||2 supyi/2(^^.^(^)) (5 g) 
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We state now the following lemma which is proved in the next section. 

Lemma 5.2. Let N be a positive integer, f belongs to J-{K,m*) and fj be the 
wavelet estimator constructed in \2.5\) . For k — 1, . . . , iV and for the integer j 
varying between jo and joe, 

VieK, y(e^''='^'(*)/2cos(7rfc/,(i))) < {u,, + tt^ X^^ {t)) e^" 

where Un is given in Lemma \5.1[ 



Direct application of Lemma 15.21 with the bound Xj(t)^ < 7 (^r) combined 
with (lOl) leads to 



E{\S2{'^)\) 
< (4M)''/22-id/2 



N 



■ N 1/2 . , . 

1.2 L 



.fe=l 



k=k„ 



< {4Mf/^2'^^/^ \K\ 4 lul/^N-'e^"^"^"^'^/^ + TTX,{t) log TV e-'^'^'^*)/^ 

TT L 



n log^exp — TV— ). 



By (153)1 and (|5?6)) . we obtain the same bound for i;(||52||oo) and £'(||S'2||2)/|]K|. 



5.1.5. Stochastic term due to the estimation of the variance 
Let A2(f) = mm{^{t),j2^'^/n}. We get 

\Siiiy)\ < [ \A^jit,j^)~ AN,jit,i^) dt 

< Efl^fc(^)l/ exp(^2^2^(t)/2)-exp(7r2fc2A2(t)/2) 



k=l 

N 



dt 



< 



k=l 

N 



E( I^^MI^ /jA2(i)-Af(t)|exp(^2fc2A,,„(t)/2)dt 



k=l 



< 2^ / |A2(t)-A^^(t)|exp(^2^2^,-^(^)/2^ 



< 2JV exp I ^TV2 2^ ) I \\yt)~X^Jt)\dt 



(5.7) 



where Aj_„(t) is an intermediate point between A|(i) and A^(t) and therefore 
|Aj,„(i)| < j2^'^/n. We note that a rough bound like |A2(t) - A2(t)| < 72^''/n is 
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too large; thus, we take r > and we split the expected value and for g = 1, 2, 
we have 



< T + 



7^ 



3'^" {|A2(t)-A2(t)l>r}^ 

X dP{x) 



where P is the probability associated with the variable |A|(t) — A^(t)|. It follows 
that 



E{\m)~\m) < r + 7 



2J 



d /■7 



< T 



n 

+ 7^p(|A^(t)-A^^(t)|>r) (5. 



dP(x) 



since |A|(t) — A^(t)| < |A^(t) — A|(t)|. We need an evaluation of the accuracy of 
the estimation of the variance of the estimate of /. The following lemma gives 
deviations for the error of estimation. For the proof, we refer to (|25l). 

Lemma 5.3. Let f belong to J-{m*) and t he in K. Assume that there exists 
a positive constant m such that i\2.6^ is satisfied. Then, for r > 2^^n~^, the 



estimator A| (t) in |^.7| j is such that 
P{m-X^{t)\>r) < c' \[^)'^ 



exp ■ 



n 



2jd 

n 



for some constants c, c' > 0. 
We use Lemma [Ol with 



/2id\3/2 

ai—j (logn)i/^ a>0 



(which is larger than to give an upper bound for 



E{\X^^it)-X%t)\) 
< a —] (logn)i/2 



+ -fc'a ^ 



n 



2jd 23'^ 

~'-fc' exp (~ca^ logn) + jc' exp 

n n 



23" 



nlogn 



-1/2N 



< a 



23" 



3/2 



(log n; 



,1/2 
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since j < joo and as soon as a > max{-\/l/(2c), l/(2c)}. From (|5.7p . we deduce 

E{\SM\) < 2\K\aN exp(^N'—\( — \ (logn)!/^ (5.9) 
and the same bound is valid for i?(||5i||oo)- Observing that 

EiWSih) < V f ||cfe(.)||2i? / exp(7r2fc2^(t)/2)-exp(^2fc2A2(t)/2) 



fc=l 

< 2Ne^p(^N^—] j E{\X]{t)~\^^{t)\)dt (5.10) 

K 



2 n 

and we obtain the same bound for i?(||5'i||2). 

5.2. Proofs of the lemmas 

5.2.1. Proof of Lemma \5.1\ 

Let t e K be fixed. Denote K{k,j) = e'^^^^^V'^. Recall that \^^{t) = V{fj{t)), 
put 

_ f,{t)-E,f{t) 
- A, 

and write 

cos(7rfc/j(i)) = cos {■KkEjf{t) + TrfcAj Xj) • 
Expand using the formula of cos (a + h) 

z = K{k,j)E{cos{Trkfj{t))) -cos{TTkEjf{t)) 

= [ii:(A:, j) (cos (TrfcAj Xj)) ~ 1] cos{TrkE j f{t)) 
- i4:(fc, j) (sin (tt/cAj Xj)) sm{TTkEjf(t)). 

Observe that for x a standard gaussian variable 

K{k,j) E (cos (TrfcAjx)) = 1 and K{k,j) E (sin (TrfcAjx)) = 0. 

We use an approximation for the law of Xj as n grows to oo. Denote ^V(o,i) 
and F^. the distribution functions of x and Xj- It follows 



z — K{k,j)cos{TrkEjf{t)) J cos {TrkXj x) {F^. — F^(^Q^ii){x)dx 

- K{k,j) sm{TTkEjf{t)) / sin {irkX^ x) [F^^ - F^(^as)){x)dx 
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Notice that 



1 



for 



Straightforward computations lead to 



E{Z,^^{tf) = - 
n 



and 



E 



< c 



for c = (2M)3'i||0|j^||(/)||2||(?!)|||||/|ioo. Using the technical assumption ^M) and 
the fact that there exists m* > such that infjgK f{t) > m*, the variance Xj{t) 
is bounded from below as follows 

X'^it) > iinf2^-^(m*5]02(2.i_^)_2-^-^((2Af)'^||0||L||/||oo)M 

\ / / 



> 



n 



Finally, we get 



2Jf 

71 



(5.11) 



for c is depending on m, m*, (/). Let us recall Esseen's inequality 

Proposition 5.1 (Theorem 2.6, Hall (9)). Let {Z; „, 1 < i < n} be a triangular 
array of independent variables, centered such that -^i^fn) = 1- there 

exists some A„ ^0 as n oo such that 

n 



then, there exists a positive universal constant a such that 



Vx e R, 



P[Y.^^^r.<x] -F^(o,i)(a^)) 



< 



1 + x2 
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We apply this inequality for A„ = (2-''^n ^)^/^, see (|5.1ip . and for each fixed 
t G K. We get, for j, n large enough 



d\ 1/2 



implying that 



\z\ < TTbK{k,j) 



21" 



5. 2. 2. Proof of Lemma \5.2\ 



We proceed as in the proof of Lemma |5. II Indeed, we have proved there, that 
for aU fc = 1, . . . ,7V and t e IK 

K{k,j)E{cos{TTkf,{t))) - cos{TTkE,f{t))\ < K{k,j)u„, 

where < b2^'^n^^/^ tends to when n oo. Moreover this entails 

K{k,jfE^{cos{7rkfj{t)))-cos^{TTkEjf{t)) <K{k,jfun. (5.12) 

For the first term of the needed variance, we have 



K{k,jfE{cos^{7rkf,{t))) - ^K{k,jf - K(k,jfcos{2TikE,f(t))) 



1 

< - 
- 2 



K{k,3fE{cos{2nkf,{t))) - Kik,jfco8i2nkE,f{t)) < -i^(fc,jf 



The proof of this last inequality is similar to the proof of Lemma 15.11 Together 
with ((5?T2)) . we get 



V 



(K(k,j)cos{nkf,{t))) 



< 



K{k,jfE(cos\nkf,{t))) - ]^K(k,jf - ]^K{k,jfcos{2nkE,f[t))) 



+ 



K{k,jfE^{cOs{TTkfj{t))) ~ COS^{TTkEjf{t)) 



1 1 



-Kik,jy + -K{k,j)-'cos{2iikEjf{t)) - ^- + -cos{2^kE,f{t)) 

< ^K{k, jf u„ + ^\ {K{k, jf - cos{2TTkE,f{t))) (1 - K{k, j)-^) \ 

< (3u„/2 + 7r2fc2A2)e'^''='^'. 

Note that u„ is the dominant term for k smaller than n^/^2^-''*/^ and n^k^X'j is 
dominant for k larger than the same value. 
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